43 research outputs found

    Gene Family Abundance Visualization based on Feature Selection Combined Deep Learning to Improve Disease Diagnosis

    Advances in machine learning in general, and in deep learning in particular, have achieved great success in numerous fields. For personalized medicine, frameworks derived from learning algorithms play an important role in helping scientists investigate novel data sources such as metagenomic data, and in developing and examining methodologies to improve human healthcare. Challenges in processing this data type include its very high dimensionality and the complexity of diseases. Metagenomic data that include gene families often have millions of features, which further increases processing complexity and computation time. In this study, we propose a method that combines feature selection using perceptron weight-based filters with synthetic image generation, in order to leverage deep-learning advances to predict various diseases from gene family abundance data. An experiment was conducted using gene family datasets for five diseases: liver cirrhosis, obesity, inflammatory bowel diseases, type 2 diabetes, and colorectal cancer. The proposed method not only provides a visualization of gene family abundance data but also achieves a promising level of performance.

    Interestingness Measures for Association Rules in a KDD Process : PostProcessing of Rules with ARQAT Tool

    This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a major research topic and a field of application in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. Over the last decade, much research has been published on association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. In return, however, they often deliver a large number of rules. As a consequence, a postprocessing task is required to help the user understand the results. One way to reduce the number of rules, whether to validate them or to select the most interesting ones, is to use interestingness measures adapted to both the user's goals and the dataset studied. Selecting the right interestingness measures is an open problem in KDD. Many measures have been proposed to extract knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications but not for others. In our thesis, we propose to study the set of interestingness measures available in the literature, in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to their needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), in order to facilitate the analysis of the behavior of about 40 interestingness measures. In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall. These graphs are also used to identify clusters of similar measures. Moreover, we propose a series of comparative studies on the correlations between interestingness measures on several datasets. We discovered a set of correlations that are not very sensitive to the nature of the data used, which we call stable correlations. Finally, 14 graphical and complementary views structured on 5 levels of analysis (ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis) are illustrated in order to show the interest of both the exploratory approach and the use of complementary views.

    Clustering interestingness measures with positive correlation


    Discovering the Stable Clusters between Interestingness Measures


    ARQAT: An Exploratory Analysis Tool For Interestingness Measures

    Finding interestingness measures to evaluate association rules has become an important knowledge quality issue in KDD. Many interestingness measures may be found in the literature, and many authors have discussed and compared interestingness properties in order to help choose the best measures for a given application. As interestingness depends both on the data structure and on the decision-maker's goals, some measures may be relevant in some contexts but not in others. Therefore, it is necessary to design new contextual approaches to help the decision-maker select the best interestingness measures. In this paper, we present ARQAT, a new tool to study the specific behavior of a set of 34 interestingness measures in the context of a specific dataset and from an exploratory data analysis perspective. The tool implements 14 graphical and complementary views structured on 5 levels of analysis: ruleset analysis, correlation and clustering analysis, best rules analysis, sensitivity analysis, and comparative analysis. The tool is described and illustrated on the mushroom dataset in order to show the interest of both the exploratory approach and the use of complementary views.

    Mesures d'intérêts pour règles d'association dans un processus d'ECD (post-traitement des règles avec l'outil ARQAT (résumé))

    This work falls within the framework of Knowledge Discovery in Databases (KDD), often called "data mining". This multidisciplinary research field also offers numerous applications in companies. KDD is concerned with discovering hidden knowledge in large volumes of data. Among the available knowledge extraction models, association rules are frequently used. They offer the advantage of allowing unsupervised discovery of implicative tendencies in data but, in return, unfortunately deliver large quantities of rules. Their use therefore requires a post-processing phase to help the end user, a decision-maker who is an expert in the data, reduce the mass of rules produced. One way to reduce the number of rules is to use numerical indicators of rule quality, called "interestingness measures". The literature proposes many such measures and studies their properties. This thesis sets out to study the range of available interestingness measures in order to evaluate their behavior according to, on the one hand, the nature of the data and, on the other, the preferences of the decision-maker. The final objective is to guide the user's choice towards the measures best suited to their needs and, in fine, to select the best rules. To this end, we propose a novel approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), to facilitate the analysis of the behavior of the 40 interestingness measures surveyed. In addition to elementary statistics, the tool allows an in-depth analysis of the correlations between measures using correlation graphs based on the coefficients proposed by Pearson, Spearman and Kendall. These graphs are also used to identify clusters of similar measures. In addition, we carried out a series of comparative studies on the correlations between interestingness measures on several datasets. From these studies, we discovered a set of correlations that are not very sensitive to the nature of the data used, which we called stable correlations. Finally, we present 14 graphical and complementary views structured into 5 levels of analysis: ruleset analysis, correlation and clustering analysis, best rules analysis, sensitivity analysis, and comparative analysis. Through examples, we show the interest of both the exploratory approach and the use of complementary views.

    Extracting representative measures for the post-processing of association rules

    Forthcoming.

    Co-modeling: An Agent-Based Approach to Support the Coupling of Heterogeneous Models


    A Developing Method for Distributed Sensing Systems
